MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Spectrum-based de novo repeat detection in genomic sequences.

Internal identifier: 002949 (Main/Exploration); previous: 002948; next: 002950

Authors: Huy Hoang Do [Singapore]; Kwok Pui Choi; Franco P. Preparata; Wing Kin Sung; Louxin Zhang

Source: Journal of computational biology : a journal of computational molecular cell biology, 2008

RBID: pubmed:18549302

Abstract

A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg.

DOI: 10.1089/cmb.2008.0013
PubMed: 18549302
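
The validation idea in the abstract (label the genome for hits in the small spectrum of a repeat candidate, then look for a region of high hit density) can be illustrated with a toy sketch. This is not the SAGRI implementation: the function names `spectrum`, `hit_labels` and `densest_window`, the fixed window width, and the use of exact k-mer matching are all assumptions made for illustration, and the first scan (FindHit) and the final pairwise alignment are not shown.

```python
from typing import List, Set, Tuple


def spectrum(seq: str, k: int) -> Set[str]:
    """Return the spectrum of seq: the set of all k-mers occurring in it."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hit_labels(genome: str, cand_spectrum: Set[str], k: int) -> List[int]:
    """Label each k-mer start position of the genome: 1 for a hit in the
    candidate spectrum, 0 otherwise (hypothetical helper)."""
    return [1 if genome[i:i + k] in cand_spectrum else 0
            for i in range(len(genome) - k + 1)]


def densest_window(labels: List[int], width: int) -> Tuple[int, float]:
    """Return (start, density) of the leftmost window of the given width
    that contains the largest number of hits."""
    if width <= 0 or width > len(labels):
        raise ValueError("width must be between 1 and len(labels)")
    hits = sum(labels[:width])
    best_start, best_hits = 0, hits
    for start in range(1, len(labels) - width + 1):
        # Slide the window one position to the right.
        hits += labels[start + width - 1] - labels[start - 1]
        if hits > best_hits:
            best_start, best_hits = start, hits
    return best_start, best_hits / width
```

For example, with a toy genome built as `"CCCCC" + r + "TTTTTTT" + r + "AAAA"` for `r = "GATTACAGATC"` and `k = 4`, labelling the genome against `spectrum(r, 4)` and calling `densest_window(labels, len(r) - k + 1)` points at position 5, the start of the leftmost copy. A real tool would have to tolerate mismatches and choose `k` and the density threshold from a probabilistic analysis, as the abstract describes.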



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Spectrum-based de novo repeat detection in genomic sequences.</title>
<author>
<name sortKey="Do, Huy Hoang" sort="Do, Huy Hoang" uniqKey="Do H" first="Huy Hoang" last="Do">Huy Hoang Do</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computer Science, National University of Singapore, Singapore.</nlm:affiliation>
<country xml:lang="fr">Singapour</country>
<wicri:regionArea>Department of Computer Science, National University of Singapore</wicri:regionArea>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
<author>
<name sortKey="Choi, Kwok Pui" sort="Choi, Kwok Pui" uniqKey="Choi K" first="Kwok Pui" last="Choi">Kwok Pui Choi</name>
</author>
<author>
<name sortKey="Preparata, Franco P" sort="Preparata, Franco P" uniqKey="Preparata F" first="Franco P" last="Preparata">Franco P. Preparata</name>
</author>
<author>
<name sortKey="Sung, Wing Kin" sort="Sung, Wing Kin" uniqKey="Sung W" first="Wing Kin" last="Sung">Wing Kin Sung</name>
</author>
<author>
<name sortKey="Zhang, Louxin" sort="Zhang, Louxin" uniqKey="Zhang L" first="Louxin" last="Zhang">Louxin Zhang</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2008">2008</date>
<idno type="RBID">pubmed:18549302</idno>
<idno type="pmid">18549302</idno>
<idno type="doi">10.1089/cmb.2008.0013</idno>
<idno type="wicri:Area/PubMed/Corpus">002103</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002103</idno>
<idno type="wicri:Area/PubMed/Curation">002103</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002103</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001F57</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001F57</idno>
<idno type="wicri:Area/Ncbi/Merge">000606</idno>
<idno type="wicri:Area/Ncbi/Curation">000606</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000606</idno>
<idno type="wicri:Area/Main/Merge">002975</idno>
<idno type="wicri:Area/Main/Curation">002949</idno>
<idno type="wicri:Area/Main/Exploration">002949</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Spectrum-based de novo repeat detection in genomic sequences.</title>
<author>
<name sortKey="Do, Huy Hoang" sort="Do, Huy Hoang" uniqKey="Do H" first="Huy Hoang" last="Do">Huy Hoang Do</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computer Science, National University of Singapore, Singapore.</nlm:affiliation>
<country xml:lang="fr">Singapour</country>
<wicri:regionArea>Department of Computer Science, National University of Singapore</wicri:regionArea>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
<author>
<name sortKey="Choi, Kwok Pui" sort="Choi, Kwok Pui" uniqKey="Choi K" first="Kwok Pui" last="Choi">Kwok Pui Choi</name>
</author>
<author>
<name sortKey="Preparata, Franco P" sort="Preparata, Franco P" uniqKey="Preparata F" first="Franco P" last="Preparata">Franco P. Preparata</name>
</author>
<author>
<name sortKey="Sung, Wing Kin" sort="Sung, Wing Kin" uniqKey="Sung W" first="Wing Kin" last="Sung">Wing Kin Sung</name>
</author>
<author>
<name sortKey="Zhang, Louxin" sort="Zhang, Louxin" uniqKey="Zhang L" first="Louxin" last="Zhang">Louxin Zhang</name>
</author>
</analytic>
<series>
<title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2008" type="published">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Genome, Human</term>
<term>Humans</term>
<term>Pattern Recognition, Automated</term>
<term>Probability</term>
<term>Repetitive Sequences, Nucleic Acid</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Génome humain</term>
<term>Humains</term>
<term>Probabilité</term>
<term>Reconnaissance automatique des formes</term>
<term>Séquences répétées d'acides nucléiques</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Genome, Human</term>
<term>Humans</term>
<term>Pattern Recognition, Automated</term>
<term>Probability</term>
<term>Repetitive Sequences, Nucleic Acid</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Génome humain</term>
<term>Humains</term>
<term>Probabilité</term>
<term>Reconnaissance automatique des formes</term>
<term>Séquences répétées d'acides nucléiques</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Singapour</li>
</country>
<orgName>
<li>Université nationale de Singapour</li>
</orgName>
</list>
<tree>
<noCountry>
<name sortKey="Choi, Kwok Pui" sort="Choi, Kwok Pui" uniqKey="Choi K" first="Kwok Pui" last="Choi">Kwok Pui Choi</name>
<name sortKey="Preparata, Franco P" sort="Preparata, Franco P" uniqKey="Preparata F" first="Franco P" last="Preparata">Franco P. Preparata</name>
<name sortKey="Sung, Wing Kin" sort="Sung, Wing Kin" uniqKey="Sung W" first="Wing Kin" last="Sung">Wing Kin Sung</name>
<name sortKey="Zhang, Louxin" sort="Zhang, Louxin" uniqKey="Zhang L" first="Louxin" last="Zhang">Louxin Zhang</name>
</noCountry>
<country name="Singapour">
<noRegion>
<name sortKey="Do, Huy Hoang" sort="Do, Huy Hoang" uniqKey="Do H" first="Huy Hoang" last="Do">Huy Hoang Do</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002949 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002949 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:18549302
   |texte=   Spectrum-based de novo repeat detection in genomic sequences.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:18549302" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021